Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add internal API and omdb command to expunge a sled #5234

Merged
merged 12 commits into from
Mar 13, 2024

Conversation

jgallagher
Copy link
Contributor

The omdb interface might be overly cautious; feedback welcome. You can try this against omicron-dev run-all; it looks roughly like:

note: using Nexus URL http://[::1]:12221
note: using database URL postgresql://root@[::1]:42285/omicron?sslmode=disable
note: database schema version matches expected (39.0.0)
WARNING: sled b6d65341-167c-41df-9b5c-41cded99c229 IS PRESENT in the most recent inventory collection; are you sure you want to mark it expunged?
WARNING: This operation will PERMANENTLY and IRRECOVABLY mark sled b6d65341-167c-41df-9b5c-41cded99c229 (sim-b6d65341) expunged. To proceed, type the sled's serial number.
sled serial〉sim-b6d65341
expunged sled b6d65341-167c-41df-9b5c-41cded99c229 (previous policy: InService(Provisionable))

This handles the "internal" half of #5134

Copy link
Contributor

@sunshowers sunshowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really good overall, thanks for doing it! Just a few comments.

dev-tools/omdb/src/bin/omdb/db.rs Show resolved Hide resolved
log: &slog::Logger,
) -> anyhow::Result<Arc<DataStore>> {
let db_url = self.resolve_pg_url(omdb, log).await?;
eprintln!("note: using database URL {}", &db_url);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this use the logger?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's supposed to, but I'm not positive. Probably a question for @davepacheco; I think the logger is only passed into all the omicron types that need one, and output from omdb itself goes to stdout or stderr.

Copy link
Contributor

@sunshowers sunshowers Mar 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. I think one way to handle this would be to have some kind of tag passed in as a kv to indicate that the message is user-facing, and in the log handler do filtering based on that. (You can also show such messages in a different manner, e.g. "warning: <text>" instead of [timestamp WARN] text.)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of very many things that get emitted directly to stdout/stderr. That's basically how omdb works today. There are definitely tradeoffs to doing it that way but I don't think it makes sense to change just this one and rethinking this approach seems a lot broader than this PR.

dev-tools/omdb/src/bin/omdb/nexus.rs Outdated Show resolved Hide resolved
dev-tools/omdb/src/bin/omdb/nexus.rs Outdated Show resolved Hide resolved
dev-tools/omdb/src/bin/omdb/nexus.rs Outdated Show resolved Hide resolved
dev-tools/omdb/src/bin/omdb/nexus.rs Outdated Show resolved Hide resolved
dev-tools/omdb/src/bin/omdb/nexus.rs Outdated Show resolved Hide resolved
dev-tools/omdb/src/bin/omdb/nexus.rs Show resolved Hide resolved
dev-tools/omdb/src/bin/omdb/nexus.rs Show resolved Hide resolved
Comment on lines +1097 to +1101
// This is an extremely dangerous and irreversible operation. We put a
// couple of safeguards in place to ensure this cannot be called without
// due consideration:
//
// 1. We'll require manual input on stdin to confirm the sled to be removed
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fine for now, but I kind of expect we're going to want to write automated tests around this sort of thing (maybe not using omdb, though).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, agreed. @benjaminleonard made some suggestions about how we might design the external API to mimic this kind of confirmation flow, but I did not implement any of that for the internal API. I put these guards on omdb, but (a) we can relax them (or provide flags to skip them) if needed, or (b) tests may be able to hit the internal API directly.

let opctx = OpContext::for_tests(log.clone(), datastore.clone());
let opctx = &opctx;

// First, we need to look up the sled so we know its serial number.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean it won't work for PCs? Or we just won't necessarily be able to do the inventory collection check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I admit I don't know enough about how PC-based racks are set up to answer this question. I think PCs still have a serial number in the sled table, right? If their sled-agent reports the same serial number, I think both this and the inventory check will work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should still work on PCs. This is how we are able to add and remove sleds on the a4x2 testbed.

dev-tools/omdb/src/bin/omdb/nexus.rs Outdated Show resolved Hide resolved
dev-tools/omdb/src/bin/omdb/nexus.rs Outdated Show resolved Hide resolved
Comment on lines +1111 to +1116
let (_authz_sled, sled) = LookupPath::new(opctx, &datastore)
.sled_id(args.sled_id)
.fetch()
.await
.with_context(|| format!("failed to find sled {}", args.sled_id))?;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the only reason we need the database connection? What about having an API instead to fetch sled info? (This is not a big deal either way.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use it both for this and fetching the latest inventory, which doesn't have APIs either (but could). Using the database connection in this subcommand seems a little sketchy to me, but I wasn't sure about adding sled info + inventory internal API commands just to support guard rails. I'll keep this as-is for now but it's easy to revisit (or replace if/when we develop those APIs for some other consumers).

@jgallagher jgallagher merged commit 4f101be into main Mar 13, 2024
22 checks passed
@jgallagher jgallagher deleted the john/expunge-sled-via-omdb branch March 13, 2024 15:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants